Abstract

The maximum a posteriori (MAP) configuration of binary variable models with submodular graph-structured energy functions can be found efficiently and exactly by graph cuts. Max-product belief propagation (MP) has been shown to be suboptimal on this class of energy functions by a canonical counterexample where MP converges to a suboptimal fixed point (Kulesza & Pereira, 2008).

In this work, we show that under a particular scheduling and damping scheme, MP is equivalent to graph cuts, and thus optimal. We explain the apparent contradiction by showing that with proper scheduling and damping, MP always converges to an optimal fixed point. Thus, the canonical counterexample only shows the suboptimality of MP with a particular suboptimal choice of schedule and damping. With proper choices, MP is optimal.



1 Introduction 

Maximum a posteriori (MAP) inference in probabilistic graphical models is a fundamental machine learning task with applications to fields such as computer vision and computational biology. There are various algorithms designed to solve MAP problems, each providing different problem-dependent theoretical guarantees and empirical performance. It is often difficult to choose which algorithm to use in a particular application. In some cases, however, there is a "gold-standard" algorithm that clearly outperforms competing algorithms, as graph cuts does for binary submodular problems.¹ A popular and more general, but also occasionally erratic, algorithm is max-product belief propagation (MP).

Our aim in this work is to establish the precise relationship between MP and graph cuts, namely that graph cuts is a special case of MP. To do so, we map analogous aspects of the algorithms to each other: message scheduling in MP to selecting augmenting paths in graph cuts; passing messages on a chain to pushing flow through an augmenting path; message damping to limiting flow to the bottleneck capacity of an augmenting path; and letting messages reinforce themselves on a loopy graph to the connected components decoding scheme of graph cuts.

This equivalence implies strong statements regarding the optimality of MP on binary submodular energies defined on graphs with arbitrary topology, which may appear to contradict much of what is known about MP: all empirical results showing MP to be suboptimal on binary submodular problems, and the theoretical results of Kulesza and Pereira (2008) and Wainwright and Jordan (2008), which show analytically that MP converges to the wrong solution. We analyze this issue in depth and show there is no contradiction; rather, implicit in the previous analysis and experiments is a suboptimal choice of scheduling and damping, which leads the algorithms to converge to bad fixed points. Our results give a more complete characterization of these issues, showing (a) there always exists an optimal fixed point for binary submodular energy functions, and (b) with proper scheduling and damping, MP can always be made to converge to an optimal fixed point.

The existence of the optimal MP fixed point can alternatively be derived as a consequence of the analysis of the zero temperature limit of convexified sum-product in Weiss et al. (2007), along with the well-known fact that the standard linear program relaxation is tight for binary submodular energies. Our proof of the existence of the fixed point, then, is an alternative, more direct proof. However, we believe our construction of the fixed point to be novel and significant, particularly because the construction comes from simply running ordinary max-product within the standard algorithmic degrees of freedom, namely damping and scheduling.

¹From here on, we drop "graph-structured" and refer to the energy functions just as binary submodular. Unless explicitly specified otherwise, though, we always assume that energies are defined on a simple graph.

Our analysis is significant for many reasons. Two of the most important are as follows. First, it shows that previous constructions of MP fixed points for binary submodular energy functions critically depend on the particular schedule, damping, and initialization. Though there exist suboptimal fixed points, there also always exist optimal fixed points, and with proper care, the bad fixed points can always be avoided. Second, it simplifies the space of MAP inference algorithms, making explicit the connection between two popular and seemingly distinct algorithms. The mapping improves our understanding of message scheduling and gives insight into how graph cut-like algorithms might be developed for more general settings.

2 Background and Notation 

We are interested in finding maximizing assignments of distributions $P(\mathbf{x}) \propto e^{-E(\mathbf{x})}$, where $\mathbf{x} = \{x_1, \ldots, x_M\} \in \{0, 1\}^M$. We can equivalently seek to minimize the energy $E$, and for the sake of exposition we choose to present the analysis in terms of energies.²

Binary Submodular Energies: We restrict our attention to submodular energy functions over binary variables. Graph-structured energy functions are defined on a simple graph, $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where each node $i$ is associated with a variable $x_i$. Potential functions $\theta_i$ and $\theta_{ij}$ map configurations of individual variables and of pairs of variables whose corresponding nodes share an edge, respectively, to real values. We write this energy function as

$$E(\mathbf{x}; \theta) = \sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{ij \in \mathcal{E}} \theta_{ij}(x_i, x_j). \tag{1}$$

$E$ is said to be submodular if and only if for all $ij \in \mathcal{E}$, $\theta_{ij}(0,0) + \theta_{ij}(1,1) \le \theta_{ij}(0,1) + \theta_{ij}(1,0)$. We use the shorthand notation $[\theta_i^0, \theta_i^1] = [\theta_i(0), \theta_i(1)]$.

²This makes "max-product" a bit of a misnomer, since in reality we will be analyzing min-sum belief propagation. The two are equivalent, however, so we will use "max-product" (MP) throughout, and it should be clear from context when we mean "min-sum".

When $E$ is submodular, it is always possible to represent all pairwise potentials in the canonical form

$$\begin{bmatrix} \theta_{ij}(0,0) & \theta_{ij}(0,1) \\ \theta_{ij}(1,0) & \theta_{ij}(1,1) \end{bmatrix} = \begin{bmatrix} 0 & \theta_{ij}^{01} \\ \theta_{ij}^{10} & 0 \end{bmatrix}$$

with $\theta_{ij}^{01}, \theta_{ij}^{10} \ge 0$, without changing the energy of any assignment. We assume that energies are expressed in this form throughout.³ In our notation, $\theta_{ij}^{10}$ and $\theta_{ji}^{01}$ refer to the same quantity.
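As a minimal sketch of one way to reach this form (an assumption on our part, not necessarily the construction used by the authors or by Kolmogorov and Zabih (2002)), a submodular table with entries $A = \theta_{ij}(0,0)$, $B = \theta_{ij}(0,1)$, $C = \theta_{ij}(1,0)$, $D = \theta_{ij}(1,1)$ can be written as $A + (C - A)x_i + (D - C)x_j + (B + C - A - D)(1 - x_i)x_j$. This moves mass into unary and constant terms and leaves a canonical pairwise term with $\theta_{ij}^{01} = B + C - A - D \ge 0$ and $\theta_{ij}^{10} = 0$. The helper name `to_canonical` is hypothetical.

```python
def to_canonical(pair):
    """Split a binary pairwise table into a constant, unary shifts, and a canonical term.

    pair[a][b] holds theta_ij(x_i=a, x_j=b). Returns (const, du_i, du_j, t01, t10) with
    theta_ij(a, b) == const + du_i[a] + du_j[b] + [[0, t01], [t10, 0]][a][b].
    """
    A, B = pair[0][0], pair[0][1]
    C, D = pair[1][0], pair[1][1]
    if A + D > B + C:
        raise ValueError("pairwise potential is not submodular")
    # theta(a, b) = A + (C - A) a + (D - C) b + (B + C - A - D) (1 - a) b
    du_i = (0.0, C - A)   # shift absorbed by the unary potential of x_i
    du_j = (0.0, D - C)   # shift absorbed by the unary potential of x_j
    return A, du_i, du_j, B + C - A - D, 0.0


print(to_canonical([[1.0, 4.0], [2.0, 3.0]]))  # (1.0, (0.0, 1.0), (0.0, 1.0), 2.0, 0.0)
```

Other splits that keep a zero diagonal and non-negative off-diagonal entries are equally valid.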



2.1 Graph Cuts 

Graph cuts is a well-known algorithm for minimizing graph-structured binary submodular energy functions, which is known to converge to the optimal solution in low-order polynomial time by transformation into a maximum network flow problem. The energy function is converted into a weighted directed graph $\mathcal{G}^{gc} = (\mathcal{V}^{gc}, \mathcal{E}^{gc}, C)$, where $C$ is an edge function that maps each directed edge $(i, j) \in \mathcal{E}^{gc}$ to a non-negative real number representing the initial capacity of the edge. One non-terminal node $v_i \in \mathcal{V}^{gc}$ is constructed for each variable $x_i \in \mathcal{V}$, and two terminal nodes, a source $s$ and a sink $t$, are added to $\mathcal{V}^{gc}$. Each edge in $\mathcal{E}$ is mapped to two edges in $\mathcal{E}^{gc}$, one per direction. The initial capacity of the directed edge $(v_i, v_j) \in \mathcal{E}^{gc}$ is set to $\theta_{ij}^{01}$, and the initial capacity of the directed edge $(v_j, v_i) \in \mathcal{E}^{gc}$ is set to $\theta_{ij}^{10}$. In addition, directed edges are created from the source node to every non-terminal node, and from every non-terminal node to the sink node. The initial capacity of the terminal edge from $s$ to $v_i$ is set to be $\theta_i^1$, and the initial capacity of the terminal edge from $v_i$ to $t$ is set to be $\theta_i^0$. We assume that the energy function has been normalized so that one of the initial terminal edge capacities is 0 for every non-terminal node.

Residual Graph: Throughout the course of an augmenting paths-based max-flow algorithm, residual capacities (or, equivalently hereafter, capacities) are maintained for each directed edge. The residual capacity is the amount of flow that can be pushed through an edge either by using unused capacity or by reversing flow that has been pushed in the opposite direction. Given a flow of $f_{ij}$ from $v_i$ to $v_j$ via edge $(v_i, v_j)$ and a flow of $f_{ji}$ from $v_j$ to $v_i$ via edge $(v_j, v_i)$, the residual capacity is $r_{ij} = \theta_{ij}^{01} - f_{ij} + f_{ji}$. An augmenting path is a path from $s$ to $t$ through the residual graph that has positive capacity. We call the minimum residual capacity of any edge along an augmenting path the bottleneck capacity of the augmenting path.

Two Phases of Graph Cuts: Augmenting path algorithms for graph cuts proceed in two phases. In Phase 1, flow is pushed through augmenting paths until all source-connected nodes (i.e., those with an edge from the source with positive capacity) are separated from all sink-connected nodes (i.e., those with an edge to the sink with positive capacity). In Phase 2, to determine assignments, a connected components algorithm is run to find all nodes that are reachable from the source and sink, respectively.

³See Kolmogorov and Zabih (2002) for a more thorough discussion of representational matters.

Phase 1 (Reparametrization): The first phase can be viewed as reparametrizing the energy function, moving mass from unary and pairwise potentials to other pairwise potentials and from unary potentials to a constant potential (Kohli & Torr, 2007). The constant potential is a lower bound on the optimum.

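We begin by rewriting (1) as

$$E(\mathbf{x}; \theta) = \sum_{i \in \mathcal{V}} \left[\theta_i^1 x_i + \theta_i^0 (1 - x_i)\right] + \sum_{ij \in \mathcal{E}} \left[\theta_{ij}^{01} (1 - x_i) x_j + \theta_{ij}^{10} x_i (1 - x_j)\right] + \theta_{\text{const}}, \tag{2}$$

where we added a constant term $\theta_{\text{const}}$, initially set to 0, to $E(\mathbf{x}; \theta)$ without changing the energy.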

A reparametrization is a change in potentials from $\theta$ to $\tilde\theta$ such that $E(\mathbf{x}; \theta) = E(\mathbf{x}; \tilde\theta)$ for all assignments $\mathbf{x}$. Pushing flow corresponds to factoring out a constant, $f$, from some subset of terms and applying the following algebraic identity to terms from (2):

$$f \cdot \left[x_1 + (1 - x_1) x_2 + \cdots + (1 - x_{N-1}) x_N + (1 - x_N)\right] = f \cdot \left[x_1 (1 - x_2) + \cdots + x_{N-1} (1 - x_N) + 1\right].$$

By ensuring that $f$ is positive (choosing paths that can sustain flow), the constant potential can be made to grow at each iteration. When no paths exist with nonzero $f$, $\theta_{\text{const}}$ is the optimal energy value (Ford & Fulkerson, 1956).
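The identity can be sanity-checked by brute force. Subtracting the right-hand side from the left leaves $x_1 - x_N + \sum_k [(1 - x_k)x_{k+1} - x_k(1 - x_{k+1})] = x_1 - x_N + \sum_k (x_{k+1} - x_k)$, which telescopes to 0. The sketch below (a hypothetical helper, `path_identity_holds`) enumerates every assignment of a path with $N$ variables; the common factor $f$ is dropped.

```python
from itertools import product


def path_identity_holds(n):
    """Check the flow-pushing identity on a path of n binary variables
    by enumerating all 2**n assignments."""
    for x in product((0, 1), repeat=n):
        lhs = x[0] + sum((1 - x[k]) * x[k + 1] for k in range(n - 1)) + (1 - x[n - 1])
        rhs = sum(x[k] * (1 - x[k + 1]) for k in range(n - 1)) + 1
        if lhs != rhs:
            return False
    return True


print(all(path_identity_holds(n) for n in range(1, 9)))  # True
```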

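In terms of the individual coefficients, pushing flow through a path corresponds to reparametrizing entries of the potentials on an augmenting path:

$$\begin{aligned}
\theta_{T_1}^1 &\leftarrow \theta_{T_1}^1 - f && (3)\\
\theta_{T_n}^0 &\leftarrow \theta_{T_n}^0 - f && (4)\\
\theta_{ij}^{01} &\leftarrow \theta_{ij}^{01} - f && \text{for all } ij \text{ on path} \quad (5)\\
\theta_{ij}^{10} &\leftarrow \theta_{ij}^{10} + f && \text{for all } ij \text{ on path} \quad (6)\\
\theta_{\text{const}} &\leftarrow \theta_{\text{const}} + f && (7)
\end{aligned}$$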
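The updates (3)-(7) can be exercised on a small chain in canonical form. The sketch below uses hypothetical helpers and assumes the path visits the chain's variables in order $T_1, \ldots, T_n$, so that each edge's forward capacity is its $\theta^{01}$ entry. It pushes the bottleneck flow once and confirms that every assignment keeps its energy.

```python
from itertools import product


def energy(unary, pair, const, x):
    """Energy (2) of a chain in canonical form.

    unary[k] = (theta_k^0, theta_k^1); pair[k] = (theta^01, theta^10) for edge (k, k+1).
    """
    e = const + sum(u[xk] for u, xk in zip(unary, x))
    for k, (t01, t10) in enumerate(pair):
        e += t01 * (1 - x[k]) * x[k + 1] + t10 * x[k] * (1 - x[k + 1])
    return e


def push_flow(unary, pair, const):
    """Push the bottleneck flow along s -> v_1 -> ... -> v_n -> t, i.e. updates (3)-(7)."""
    f = min([unary[0][1], unary[-1][0]] + [t01 for t01, _ in pair])
    unary = [list(u) for u in unary]
    unary[0][1] -= f                                   # (3)
    unary[-1][0] -= f                                  # (4)
    pair = [(t01 - f, t10 + f) for t01, t10 in pair]   # (5), (6)
    return [tuple(u) for u in unary], pair, const + f, f  # (7)


unary = [(0.0, 3.0), (0.0, 0.0), (2.0, 0.0)]
pair = [(2.0, 1.0), (4.0, 0.0)]
new_unary, new_pair, new_const, f = push_flow(unary, pair, 0.0)
ok = all(abs(energy(unary, pair, 0.0, x) - energy(new_unary, new_pair, new_const, x)) < 1e-12
         for x in product((0, 1), repeat=3))
print(f, new_const, ok)  # 2.0 2.0 True
```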



Phase 2 (Connected Components): After no more paths can be found, most nodes will not be directly connected to the source or the sink by an edge that has positive capacity in the residual graph. In order to determine assignments, information must be propagated from the nodes that are directly connected to a terminal, through positive-capacity edges between non-terminal nodes. A connected components procedure is run, and any node that is (possibly indirectly) connected to the source is assigned label 0, and any node that is (possibly indirectly) connected to the sink is given label 1. (Since the edge from $s$ to $v_i$ carries $\theta_i^1$, a node left on the source side of the cut pays $\theta_i^0$, i.e., takes label 0.) Nodes that are not connected to either terminal can be given an arbitrary label without changing the energy of the configuration, so long as the labels are consistent within a connected component. In practice, terminal-disconnected nodes are typically given label 0.
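Phase 2 can be sketched as two breadth-first searches over positive-capacity residual edges: one forward from $s$ and one backward from $t$. Because the edge $s \to v_i$ carries $\theta_i^1$, a node left on the source side of the cut pays $\theta_i^0$, so the sketch labels source-reachable nodes 0 and nodes that can still reach the sink 1, defaulting unreached nodes to 0. It assumes residual capacities are stored in a dict keyed by directed edge; `decode` is a hypothetical name.

```python
from collections import deque


def decode(residual, nodes, s="s", t="t"):
    """Label nodes by reachability over edges with positive residual capacity."""
    out_adj, in_adj = {}, {}
    for (u, v), cap in residual.items():
        if cap > 0:
            out_adj.setdefault(u, []).append(v)
            in_adj.setdefault(v, []).append(u)

    def reach(start, adj):
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen

    from_source = reach(s, out_adj)   # nodes s can still push flow to
    to_sink = reach(t, in_adj)        # nodes that can still push flow to t
    # After max-flow no node is in both sets; unreached nodes default to 0.
    return {v: 1 if v in to_sink and v not in from_source else 0 for v in nodes}


residual = {("s", "a"): 1.0, ("a", "b"): 1.0, ("b", "c"): 0.0,
            ("c", "t"): 2.0, ("d", "t"): 0.0}
print(decode(residual, ["a", "b", "c", "d"]))  # {'a': 0, 'b': 0, 'c': 1, 'd': 0}
```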

2.2 Strict Max-Product Belief Propagation 

Strict max-product belief propagation (Strict MP) is an iterative, local, message passing algorithm that can be used to find the MAP configuration of a distribution specified by a tree-structured graphical model. The algorithm can also be applied to loopy graphs. Employing the energy function notation, the algorithm is usually referred to as min-sum. Using the factor-graph representation (Kschischang et al., 2001), the iterative updates on simple graph-structured energies involve sending messages from factors to variables

$$m_{\theta_i \to x_i}(x_i) = \theta_i(x_i) \tag{8}$$

$$m_{\theta_{ij} \to x_j}(x_j) = \min_{x_i} \left[\theta_{ij}(x_i, x_j) + m_{x_i \to \theta_{ij}}(x_i)\right] \tag{9}$$

and from variables to factors, $m_{x_i \to \theta_{ij}}(x_i) = m_{\theta_i \to x_i}(x_i) + \sum_{i' \in \mathcal{N}(i) \setminus \{j\}} m_{\theta_{i'i} \to x_i}(x_i)$, where $\mathcal{N}(i)$ is the set of neighbor variables of $i$ in $\mathcal{G}$. In Strict MP, we require that all messages are updated in parallel in each iteration. Assignments are typically decoded from beliefs as $\hat{x}_i = \arg\min_{x_i} b_i(x_i)$, where $b_i(x_i) = \theta_i(x_i) + \sum_{j \in \mathcal{N}(i)} m_{\theta_{ij} \to x_i}(x_i)$. Pairwise beliefs are defined as $b_{ij}(x_i, x_j) = \theta_{ij}(x_i, x_j) + m_{x_i \to \theta_{ij}}(x_i) + m_{x_j \to \theta_{ij}}(x_j)$.⁴
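For binary variables, update (9) is a minimum over two terms for each output entry. A minimal sketch, assuming a 2×2 table indexed as `theta_ij[x_i][x_j]` and normalization to a zero minimum as described in footnote 4; the function names are illustrative only.

```python
def normalize(m):
    """Shift a message or belief so that its smallest entry is 0."""
    lo = min(m)
    return [v - lo for v in m]


def factor_to_var(theta_ij, msg_in):
    """Min-sum message (9) from pairwise factor theta_ij to x_j, given the
    incoming variable-to-factor message msg_in = m_{x_i -> theta_ij}."""
    out = [min(theta_ij[a][b] + msg_in[a] for a in (0, 1)) for b in (0, 1)]
    return normalize(out)


def unary_belief(theta_i, factor_msgs):
    """b_i(x_i) = theta_i(x_i) + sum of incoming factor-to-variable messages."""
    return [theta_i[x] + sum(m[x] for m in factor_msgs) for x in (0, 1)]


theta = [[0.0, 2.0], [3.0, 0.0]]         # canonical form: theta^01 = 2, theta^10 = 3
print(factor_to_var(theta, [0.0, 5.0]))  # [0.0, 2.0]
```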

2.3 Max-Product Belief Propagation 

In practice, Strict MP does not converge well, so a combination of damping and asynchronous message passing schemes is typically used. Thus, MP is actually a family of algorithms. We formally define the family as follows:

Definition 1 (Max-Product Belief Propagation). MP is a message passing algorithm that computes messages as in (9). Messages may be initialized arbitrarily, scheduled in any (possibly dynamic) ordering, and damped in any (possibly dynamic) manner, so long as the fixed points of the algorithm are the same as the fixed points of Strict MP.

We believe this definition to be broad enough to contain most algorithms that are considered to be max-product, yet restrictive enough to exclude, e.g., fundamentally different linear program-based algorithms like tree-reweighted max-product.



⁴Note that we only need message values to be correct up to a constant, so it is common practice to normalize messages and beliefs so that the minimum entry in a message or belief vector is 0.



There has been much work on scheduling messages, including a recent string of work on dynamic asynchronous scheduling (Elidan et al., 2006; Sutton & McCallum, 2007), which shows that adaptive schedules can lead to improved convergence. An equally important practical concern is message damping. Dueck (2010), for example, discusses the importance of damping in detail with respect to using MP for exemplar-based clustering (affinity propagation). Our definition of MP includes these variants.

2.4 Augmenting Path = Chain Subgraph 

Our scheduling makes use of dynamically chosen chains, which are analogous to augmenting paths. Formally, an augmenting path is a sequence of nodes

$$T = (s, v_{T_1}, v_{T_2}, \ldots, v_{T_{n-1}}, v_{T_n}, t) \tag{10}$$

through which a nonzero amount of flow can be pushed. It will be useful to refer to $\mathcal{E}^{gc}(T)$ as the set of edges encountered along $T$.

Let $\mathbf{x}_T \subseteq \mathbf{x}$ be the variables corresponding to non-terminal nodes in $T$. The potentials corresponding to the edges $\mathcal{E}^{gc}(T)$ are denoted by $\Theta_T$, and the subset of potential entries that act as capacities along $T$ by $\theta_T$. Formally,

$$\begin{aligned}
\mathbf{x}_T &= \{x_{T_1}, \ldots, x_{T_n}\}\\
\Theta_T &= \theta_{T_1} \cup \{\theta_{T_k T_{k+1}}\}_{k=1}^{n-1} \cup \theta_{T_n}\\
\theta_T &= \theta_{T_1}^1 \cup \{\theta_{T_k T_{k+1}}^{01}\}_{k=1}^{n-1} \cup \theta_{T_n}^0.
\end{aligned} \tag{11}$$

Note that there are only two unary potentials on a chain corresponding to an augmenting path, which correspond to the terminal edges in $\mathcal{E}^{gc}(T)$. It will be useful to map edges in $\mathcal{E}^{gc}(T)$ to edges in the equivalent factor graph representation. We use $\mathcal{E}^{fg}(T)$ to denote all factor-graph edges between potentials in $\Theta_T$ and variables in $\mathbf{x}_T$.

As an example, an augmenting path $T = (s, v_i, v_j, v_k, t)$ in the graph cut formulation would be mapped to $\mathbf{x}_T = \{x_i, x_j, x_k\}$, $\Theta_T = \{\theta_i, \theta_{ij}, \theta_{jk}, \theta_k\}$, and $\theta_T = \{\theta_i^1, \theta_{ij}^{01}, \theta_{jk}^{01}, \theta_k^0\}$.
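The bookkeeping in (11) is mechanical. The sketch below, with a hypothetical tuple encoding of potential entries, turns a path given as a node sequence into $\mathbf{x}_T$ and the capacity entries $\theta_T$; it reproduces the example above.

```python
def chain_from_path(path):
    """Map an augmenting path ('s', v_T1, ..., v_Tn, 't') to (x_T, theta_T).

    Entries of theta_T are encoded as tuples (a hypothetical encoding):
    ('unary', i, label) for theta_i^label and ('pair', i, j, '01') for theta_ij^01.
    """
    if len(path) < 3 or path[0] != "s" or path[-1] != "t":
        raise ValueError("expected a path of the form ('s', ..., 't')")
    x_T = list(path[1:-1])
    theta_T = [("unary", x_T[0], 1)]                       # capacity of s -> v_T1
    theta_T += [("pair", a, b, "01") for a, b in zip(x_T, x_T[1:])]
    theta_T.append(("unary", x_T[-1], 0))                  # capacity of v_Tn -> t
    return x_T, theta_T


print(chain_from_path(("s", "i", "j", "k", "t"))[0])  # ['i', 'j', 'k']
```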

3 Augmenting Paths Max-Product

In this section, we present Augmenting Paths Max-Product (APMP), a particular scheduling and damping scheme for MP that, like graph cuts, has two phases. At each iteration of the first phase, the scheduler returns a chain on which to pass messages. Holding all other messages fixed, messages are passed forward and backward on the chain, with standard message normalization applied, to complete the iteration. Adaptive message damping applied to messages leaving unary factors (described below) ensures that messages propagate across the chain in a particularly structured way. Messages leaving pairwise factors and messages from variables to factors are not damped. Phase 1 terminates when the scheduler indicates there are no more messages to send; then, in Phase 2, Strict MP is run until convergence (we guarantee it will converge). The full APMP is given in Algorithm 1.

3.1 Phase 1: Path Scheduling and Damping

For convenience, we use the convention that chains go from "left" to "right," where the left-most variable on a chain corresponding to an augmenting path is $x_{T_1}$, and the right-most variable is $x_{T_n}$. In these terms, a forward pass is from left to right, and a backward pass is from right to left.

Suppose that at the end of iteration $t - 1$, the outgoing message from the unary factor at the start of the chain used in iteration $t$, $T = T(t)$, is $m^{(t-1)}_{\theta_{T_1} \to x_{T_1}}(x_{T_1}) = (0, b)^\top$. If the factor increments its outgoing message in such a way as to guarantee that $b + f \le \theta_{ij}^{01}$ for all steps along $T$, the messages shown in Fig. 1 will be computed (see Corollary 1 below). Later analysis will explain why this is desirable. Accounting for message normalization, this can be accomplished by limiting the change $\Delta m^{(t)}(\cdot) = m^{(t)}(\cdot) - m^{(t-1)}(\cdot)$ in the outgoing message from the first unary factor on a path to be $\Delta m^{(t)}_{\theta_{T_1} \to x_{T_1}}(x_{T_1}) = (0, f)^\top$. We also constrain the increment in the backward direction to equal the increment in the forward direction.

Under these constraints, the largest $f$ we can choose is

$$f = \min\Big[\theta_{T_1}^1 - m^{(t-1)}_{\theta_{T_1} \to x_{T_1}}(1),\; \theta_{T_n}^0 - m^{(t-1)}_{\theta_{T_n} \to x_{T_n}}(0),\; \min_{ij \in \mathcal{E}^{gc}(T)} \big[\theta_{ij}^{01} - m^{(t-1)}_{x_i \to \theta_{ij}}(1) + m^{(t-1)}_{x_i \to \theta_{ij}}(0)\big]\Big], \tag{12}$$

which is exactly the bottleneck capacity of the corresponding augmenting path. In other words, limiting the change in outgoing message value from unary factors to the bottleneck capacity of the augmenting path ensures that message increments propagate through a chain unmodified; that is, when one variable on the path receives an increment of $(0, f)^\top$, as $x_i$ does in Fig. 1(b), it will propagate the same increment to the next variable on the path ($x_j$), as in Fig. 1(c). This is proved in Lemma 1.
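Eq. (12) reads directly as code once messages are stored per edge. A minimal sketch with a hypothetical argument layout (each message as an `(m(0), m(1))` pair); each pairwise term has the form $\theta_{ij}^{01} - m(1) + m(0)$, and with all messages at zero the result reduces to the plain graph cuts bottleneck $\min \theta_T$.

```python
def bottleneck(theta_first1, m_first, theta_last0, m_last, pair_terms):
    """Eq. (12): the largest admissible increment f on the current chain.

    theta_first1: theta^1 of the first unary factor; m_first: its current outgoing message.
    theta_last0:  theta^0 of the last unary factor;  m_last: its current outgoing message.
    pair_terms:   list of (theta^01_ij, m_{x_i -> theta_ij}) along the chain.
    """
    caps = [theta_first1 - m_first[1], theta_last0 - m_last[0]]   # terminal edges
    caps += [t01 - m[1] + m[0] for t01, m in pair_terms]           # pairwise edges
    return min(caps)


print(bottleneck(5.0, (0.0, 2.0), 4.0, (1.0, 0.0),
                 [(6.0, (0.0, 3.0)), (2.0, (0.0, 0.0))]))  # 2.0
```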

Figure 1: The pairwise potential is in square brackets. Only messages changed relative to the previous subfigure are shown in parentheses. Let $a = a_1 + a_2 + a_3$ and $b = b_1 + b_2 + b_3$. (a) Start of iteration. The capacity of the edge $ij$ is $f$. (b) Inductive assumption that each node on the augmenting path will receive a message increment of $(0, f)$ from its left neighbor. (c) Passing messages completes the inductive step, where $x_j$ receives an incremented message. (d) Similarly, receiving an incremented message in the backward direction and then updating messages from $j$ to $i$ completes the iteration.

Damping: The key idea of the damping scheme is simple: we want unary factors to increment their messages by the bottleneck capacity of the current chain. The necessary value of $f$ can be achieved by damping the outgoing message from the first and last unary potential on each chain. For the first unary factor, if we previously have message $(0, b)^\top$ on the edge, then to produce message $(0, b + f)^\top$, we can apply damping $\lambda_{T_1}(t)$, where $\lambda_{T_1}(t)$ is chosen by solving the equation

$$\lambda_{T_1}(t) \cdot b + \left(1 - \lambda_{T_1}(t)\right) \cdot \theta_{T_1}^1 = f + b, \tag{13}$$

yielding $\lambda_{T_1}(t) = \dfrac{\theta_{T_1}^1 - b - f}{\theta_{T_1}^1 - b}$. The denominator is the residual capacity of the terminal edge into $v_{T_1}$, and the algorithm never chooses an augmenting path with zero capacity, so the denominator is never zero.

Analogous damping is applied in the opposite direction. This dynamic damping will then produce the same message increments in the forward and backward directions, which will be a key property used in later analysis.
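Solving (13) and applying the resulting convex combination can be sketched as follows. The sketch assumes the undamped message from the first unary factor is its potential $(0, \theta_{T_1}^1)^\top$ (i.e., $\theta_{T_1}^0 = 0$ after normalizing terminal capacities) and that damping mixes messages as $\lambda \cdot \text{old} + (1 - \lambda) \cdot \text{new}$; both function names are hypothetical.

```python
def damping_weight(theta1, b, f):
    """Solve lam*b + (1 - lam)*theta1 = b + f for lam (Eq. 13)."""
    residual = theta1 - b          # residual capacity of the edge s -> v_T1
    if residual <= 0 or not 0 < f <= residual:
        raise ValueError("need 0 < f <= theta1 - b")
    return (theta1 - b - f) / residual


def damped(old, new, lam):
    """Entrywise convex combination lam*old + (1 - lam)*new."""
    return [lam * o + (1 - lam) * n for o, n in zip(old, new)]


lam = damping_weight(theta1=5.0, b=1.0, f=2.0)
print(lam, damped([0.0, 1.0], [0.0, 5.0], lam))  # 0.5 [0.0, 3.0]
```

When $f$ equals the full residual capacity, the weight is 0 and the unary factor sends its undamped potential.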

Algorithm 1 Augmenting Paths Max-Product

    f(0) ← ∞
    t ← 0
    while f(t) > 0 do                                  {Phase 1}
        T(t), f(t) ← SCHEDULE(F_G(t))
        λ_T1(t), λ_Tn(t) ← DAMPING(F_G(t), T(t), f(t))
        F_G(t+1) ← MP(F_G(t), E^fg(T(t)), λ_T1(t), λ_Tn(t))
    end while
    while not converged do                             {Phase 2}
        Run Strict MP
    end while

SCHEDULE Implementation: The combination of potentials and messages on the edges contains the same information as the residual capacities in the graph cuts residual graph. Using this equivalence, any algorithm for finding augmenting paths in the graph cut setting can be used to find chains to pass messages on for MP. The terms being minimized over in Eq. (12) are residual capacities, which are defined in terms of messages and potentials. Specifically, at the end of any iteration of the MP algorithm described in the next section, the residual capacities of edges between non-terminal nodes can be constructed from potentials and current messages $m$ as follows:

$$r_{ij} = \theta_{ij}^{01} - m_{x_i \to \theta_{ij}}(1) + m_{x_i \to \theta_{ij}}(0). \tag{14}$$

The difference in messages $m_{x_i \to \theta_{ij}}(1) - m_{x_i \to \theta_{ij}}(0)$ is then equivalent to the difference in flows $f_{ij} - f_{ji}$ in the graph cuts formulation. The residual capacities for terminal edges can be constructed from messages and potentials related to unary factors:

$$r_{si} = \theta_i^1 - m_{\theta_i \to x_i}(1) \tag{15}$$

$$r_{it} = \theta_i^0 - m_{\theta_i \to x_i}(0) \tag{16}$$
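Since any augmenting-path finder can serve as SCHEDULE, one option is the breadth-first (shortest augmenting path) search used by Edmonds-Karp. The sketch assumes residual capacities computed as in Eqs. (14) and (15) are stored in a dict keyed by directed edge; `schedule` is a hypothetical name, and it returns the chosen path together with its bottleneck $f$.

```python
from collections import deque


def schedule(residual, s="s", t="t"):
    """Return (path, f) for a shortest s-t path with positive residual capacity,
    or (None, 0) when no augmenting path remains (end of Phase 1)."""
    adj = {}
    for (u, v), cap in residual.items():
        if cap > 0:
            adj.setdefault(u, []).append(v)
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if t not in parent:
        return None, 0
    path = [t]
    while parent[path[-1]] is not None:   # walk back from t to s
        path.append(parent[path[-1]])
    path.reverse()
    f = min(residual[(u, v)] for u, v in zip(path, path[1:]))  # bottleneck capacity
    return path, f


print(schedule({("s", "a"): 3.0, ("a", "b"): 2.0, ("b", "t"): 4.0, ("a", "t"): 0.0}))
# (['s', 'a', 'b', 't'], 2.0)
```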
3.2 Phase 2: Strict MP 

When the scheduler cannot find a positive-capacity path on which to pass messages, it switches to its second phase and passes all messages at all iterations with no damping, i.e., Strict MP. It continues until reaching a fixed point. (We will prove in Section 5 that if potentials are finite, it will always converge.) The choice of Strict MP is not essential; we can prove the same results for any reasonable scheduling of messages.

4 APMP Phase 1 Analysis 

Assume that at the beginning of iteration $t$, each variable $x_i \in \mathbf{x}_{T(t)}$ has received an incoming message from its left-neighboring factor $\theta_{hi}$, $m_{\theta_{hi} \to x_i}(x_i) = (a, b)^\top$. We want to show that when each variable receives an incremented message, $(a, b + f)^\top$, the increment $(0, f)^\top$ (up to a normalizing constant) will be propagated through the variable and the next factor, $\theta_{ij}$, to the next variable on the path.

The pairwise potential at the next pairwise factor along the chain will be $\theta_{ij}$. The damping scheme ensures that $\theta_{ij}^{10} \ge a$ and $\theta_{ij}^{01} \ge b + f$. Lemma 1 shows that under these conditions, factors will propagate messages unchanged.

Lemma 1 (Message-Preserving Factors). When passing standard MP messages with the factors as above, i.e., with $\theta_{ij}^{10} \ge a$ and $\theta_{ij}^{01} \ge b + f$, the outgoing factor-to-variable message is equal to the incoming variable-to-factor message, i.e., $m_{\theta_{ij} \to x_j} = m_{x_i \to \theta_{ij}}$.

Proof. This follows from plugging values into the message update (9): with incoming message $(a, b + f)^\top$ and non-negative (normalized) entries, $m_{\theta_{ij} \to x_j}(0) = \min(a, \theta_{ij}^{10} + b + f) = a$ and $m_{\theta_{ij} \to x_j}(1) = \min(\theta_{ij}^{01} + a, b + f) = b + f$. See supplementary materials. □
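Lemma 1 is easy to probe numerically. The sketch below assumes canonical potentials and non-negative (normalized) messages; it draws random instances satisfying $\theta_{ij}^{10} \ge a$ and $\theta_{ij}^{01} \ge b + f$ and checks that update (9) returns the incoming message unchanged. The last line shows the property failing when $\theta_{ij}^{01}$ is too small. Helper names are hypothetical.

```python
import random


def canonical_factor_to_var(t01, t10, msg_in):
    """Update (9) for the canonical table [[0, t01], [t10, 0]] (no normalization)."""
    m0 = min(msg_in[0], t10 + msg_in[1])   # x_j = 0: from x_i = 0 or x_i = 1
    m1 = min(t01 + msg_in[0], msg_in[1])   # x_j = 1: from x_i = 0 or x_i = 1
    return (m0, m1)


def lemma1_holds(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, f = rng.uniform(0, 5), rng.uniform(0, 5), rng.uniform(0, 5)
        t10 = a + rng.uniform(0, 5)        # enforce theta^10 >= a
        t01 = b + f + rng.uniform(0, 5)    # enforce theta^01 >= b + f
        msg_in = (a, b + f)
        if canonical_factor_to_var(t01, t10, msg_in) != msg_in:
            return False
    return True


print(lemma1_holds())                                  # True
print(canonical_factor_to_var(1.0, 0.0, (0.0, 3.0)))  # (0.0, 1.0): theta^01 < b + f
```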

Lemma 1 allows us to easily compute the values of all messages passed during the execution of Phase 1 of APMP, and thus the change in beliefs at each variable.

Corollary 1 (Structured Belief Changes). Before and after an iteration $t$ of Phase 1 APMP, the change in unary belief at each variable in $\mathbf{x}_{T(t)}$ will be $(0, 0)^\top$, up to a constant normalization.

Proof. Under the APMP damping scheme, the change in message from the first unary factor in $T(t)$ will be $(0, f)^\top$, and the change in message from the last unary factor in $T(t)$ will be $(f, 0)^\top$, where $f$ is as defined in Eq. (12). Without message normalization, these messages will propagate unchanged through the pairwise factors in $T(t)$ by Lemma 1. Variable-to-factor messages will also propagate the change unaltered.

Message normalization subtracts a non-negative constant $c = \min(a, b + f)$ from both entries in a message vector. Existing message values will only get smaller, so the message-preserving property of factors will be maintained. Thus, each variable will receive a message change of $(-c_L, f - c_L)^\top$ from the left and a message change of $(f - c_R, -c_R)^\top$ from the right. The total change in belief is then $(f - c_L - c_R, f - c_L - c_R)^\top$, which completes the proof. □

Fig. 1 illustrates the structured message changes. 
4.1 Message Free View 

Here, using the reparametrization view of max-product from Wainwright et al. (2004), we analyze the equivalent "message-free" version of the first phase of APMP: one that directly modifies potentials rather than sending messages. Corollary 1 shows that all messages in APMP can be analytically computed. We then use these message values to compute the change in parameterization due to the messages at each iteration. The main result in this section is that this change in parameterization is exactly equivalent to that performed by graph cuts.

An important identity, which is a special case of the junction tree representation (Wainwright et al., 2004), states that we can equivalently view MP on a tree as reparameterizing $\theta$ according to beliefs $b$:

$$\sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{ij \in \mathcal{E}} \theta_{ij}(x_i, x_j) = \sum_{i \in \mathcal{V}} b_i(x_i) + \sum_{ij \in \mathcal{E}} \left[b_{ij}(x_i, x_j) - b_i(x_i) - b_j(x_j)\right] \tag{17}$$

where the right-hand side defines a reparametrization $\tilde\theta$, i.e., $E(\mathbf{x}; \tilde\theta) = E(\mathbf{x}; \theta)$ for all $\mathbf{x}$. At any point, we can stop, calculate the current beliefs, and apply the reparameterization (i.e., replace original potentials with reparameterized potentials and set all messages to 0). This holds for damped factor graph max-product even if factor-to-variable messages are damped.

"Used" and "Remainder" Energies: To analyze reparameterizations, we begin by splitting $E$ into two components: a part that has been used so far, and a remainder part. The used part is defined as the energy function that would have produced the current messages if no damping were used. The remainder is everything else. Since damping is only applied at unary potentials, we assign all pairwise potentials to the used component: $\theta_{ij}^{(U)}(x_i, x_j) = \theta_{ij}(x_i, x_j)$. The used component of unary potentials can easily be defined as the current message leaving the factor: $\theta_i^{(U)}(x_i) = m_{\theta_i \to x_i}(x_i)$. Consequently, the remainder pairwise potentials are zero, and the remainder unary potentials are $\theta_i^{(R)}(x_i) = \theta_i(x_i) - \theta_i^{(U)}(x_i)$. We apply the message-free interpretation to get a reparameterized version of $E(\mathbf{x}; \theta^{(U)})$, then add in the remainder component of the energy unmodified.
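A small numerical check of identity (17). The sketch builds beliefs and variable-to-factor messages from arbitrary factor-to-variable messages, with undamped unary messages ($m_{\theta_i \to x_i} = \theta_i$), on a small loopy graph. The message terms cancel exactly, so the identity does not depend on the messages being at a fixed point; this is why the reparameterization can be applied at any point of a run. The helper name and graph are hypothetical; it returns the largest energy discrepancy over all assignments.

```python
import random
from itertools import product


def max_reparam_gap(seed=0, n=4, edges=((0, 1), (1, 2), (2, 3), (3, 0), (0, 2))):
    """Largest |E(x; theta) - E(x; theta~)| over all x, with theta~ the right-hand
    side of (17) built from random, non-converged messages."""
    rng = random.Random(seed)
    unary = [[rng.uniform(-2, 2) for _ in range(2)] for _ in range(n)]
    pair = {e: [[rng.uniform(-2, 2) for _ in range(2)] for _ in range(2)] for e in edges}
    # msg[(e, v)] = m_{theta_e -> x_v}: arbitrary factor-to-variable messages
    msg = {(e, v): [rng.uniform(-2, 2) for _ in range(2)] for e in edges for v in e}

    def b_unary(v, xv):
        return unary[v][xv] + sum(msg[(e, v)][xv] for e in edges if v in e)

    def var_to_factor(v, e, xv):
        # every incoming message except the one from factor e (unary message = theta_v)
        return b_unary(v, xv) - msg[(e, v)][xv]

    def b_pair(e, xi, xj):
        i, j = e
        return pair[e][xi][xj] + var_to_factor(i, e, xi) + var_to_factor(j, e, xj)

    gap = 0.0
    for x in product((0, 1), repeat=n):
        original = sum(unary[v][x[v]] for v in range(n)) + sum(
            pair[e][x[e[0]]][x[e[1]]] for e in edges)
        reparam = sum(b_unary(v, x[v]) for v in range(n)) + sum(
            b_pair(e, x[e[0]], x[e[1]]) - b_unary(e[0], x[e[0]]) - b_unary(e[1], x[e[1]])
            for e in edges)
        gap = max(gap, abs(original - reparam))
    return gap


print(max_reparam_gap() < 1e-9)  # True
```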

Analyzing Beliefs: The parameterization in Eq. (17) 
depends on unary and pairwise beliefs. We consider 
the change in beliefs from that defined by messages at 
the start of an iteration of APMP to that defined by 
messages at the end of an iteration. There are three 
cases to consider. 

Case 1: Variables and potentials not in or neighboring $\mathbf{x}_{T(t)}$ will not have any potentials or adjacent beliefs changed, so the reparametrization will not change.

Case 2: Potentials neighboring $x \in \mathbf{x}_{T(t)}$ but not in $\mathcal{E}^{fg}(T(t))$ could possibly be affected by the belief at a variable in $\mathbf{x}_{T(t)}$, since the belief at an edge depends on the beliefs at the variables at each of its endpoints. However, by Corollary 1, after applying standard normalization, this belief does not change after a forward and backward pass of messages, so overall these potentials are unaltered.

Case 3: We now consider the beliefs of potentials $\theta_{ij} \in \Theta_{T(t)}$. This is the most involved case, where the parametrization does change, but it does so in a very structured way.



Lemma 2. The change in pairwise belief on the current augmenting path $T(t)$ from the beginning of an iteration $t$ to the end of the iteration is

$$\Delta b_{ij}(x_i, x_j) = \begin{bmatrix} 0 & -f \\ +f & 0 \end{bmatrix} + f, \qquad ij \in T(t). \tag{18}$$

Proof. This follows from applying the standard reparameterization (17) to messages before and after an iteration of Phase 1 APMP. See supplementary material for details. □

Unary Reparameterizations: As discussed above, the used part of the energy is grouped with messages and reparameterized as standard, while the remainder part is left unchanged and is added in at the end:

$$\tilde\theta_i(x_i) = b_i(x_i; \theta^{(U)}) + \theta_i^{(R)}(x_i). \tag{19}$$

Parameterizations defined in this way are proper reparameterizations of the original energy function.

Lemma 3. The changes in parameterization during iteration $t$ of Phase 1 APMP at variables $x_{T_1}$ and $x_{T_n}$ are $(0, -f)^\top$ and $(-f, 0)^\top$, respectively. The change in all other unary potentials is $(0, 0)^\top$.

Proof. The Phase 1 damping scheme ensures that the message leaving the first factor on $T = T(t)$ is incremented by $(0, f)^\top$. This means that $\theta_{T_1}^{(U)}(x_{T_1})$ is incremented by $(0, f)^\top$, so $\theta_{T_1}^{(R)}(x_{T_1})$ is decremented by $(0, f)^\top$ to maintain the decomposition constraint. Unary beliefs do not change, so the new parameterization is then $\Delta\tilde\theta_{T_1}(x_{T_1}) = \Delta b_{T_1}(x_{T_1}) + \Delta\theta_{T_1}^{(R)}(x_{T_1}) = (0, -f)^\top$. A similar argument holds for $\Delta\tilde\theta_{T_n}$.

The only unary potentials whose outgoing messages change in an iteration of APMP are at the endpoints of $T(t)$, so no other $\theta^{(R)}$ values will change. The total change in parameterization at non-endpoint unary potentials is then $(0, 0)^\top$. □

Full Reparameterizations: Finally, we are ready to prove our first main result.

Theorem 1. The difference between two reparametrizations induced by the messages in Phase 1 APMP, before and after passing messages on the chain corresponding to augmenting path $T(t)$, is equal to the difference between reparametrizations of graph cuts before and after pushing flow through the equivalent augmenting path.

Proof. The change in unary parameterization is given by Lemma 3. The change in pairwise parameterization is $\Delta\tilde\theta_{ij}(x_i, x_j) = \Delta b_{ij}(x_i, x_j) - \Delta b_i(x_i) - \Delta b_j(x_j) = \begin{bmatrix} 0 & -f \\ +f & 0 \end{bmatrix}$ (up to a constant), where $\Delta b_{ij}(x_i, x_j)$ is given by Lemma 2.





Figure 2: Illustration of the first two "used" energies and associated fixed points constructed by APMP on the problem from Fig. 4. Potentials $\theta^{(U)}$ are given in square brackets. Messages have no parentheses. Edges with messages equal to (0, 0), and pairwise potentials, which are assumed strong, are not drawn to reduce clutter. (a) First energy. (b) Second energy. Note that both sets of messages give a max-product fixed point for the respective energy.

Putting the two together, we see that the changes in potential entries are exactly the same as those performed by graph cuts in (3)-(7). This completes the proof of equivalence between Phase 1 APMP and Phase 1 of graph cuts. □

Fig. 2 shows $\theta^{(U)}$ and Phase 1 APMP messages from running two iterations on the example from Fig. 4.

5 APMP Phase 2 Analysis 

We now consider the second phase of APMP. Throughout this section, we will work with the reparameterized energy that results from applying the equivalent reparameterization view of MP at the end of APMP Phase 1; that is, we have applied the reparameterization to potentials and reset messages to 0. All results could equivalently be shown by working with original potentials and messages at the end of Phase 1, but the chosen presentation is simpler.

At this point, there are no paths with nonzero capacity between a unary potential of the form $(0, a)^\top, a > 0$ and a unary potential of the form $(b, 0)^\top, b > 0$. Practically, as in graph cuts, breadth-first search could be used at this point to find an optimal assignment. However, we will show that running Strict MP leads to convergence to an optimal fixed point. This proves the existence of an optimal MP fixed point for any binary submodular energy and gives a constructive algorithm (APMP) for finding it.

Figure 3: A decomposition into three homogeneous islands. The left-most and right-most islands have beliefs of the form $(\alpha, 0)^\top$, while the middle has beliefs of the form $(0, \beta)^\top$. Non-touching cross-island lines indicate that messages passed from one island to another will be identically 0 after any number of internal iterations of message passing within an island.

Our analysis relies upon the reparameterization at the end of Phase 1 defining what we term homogeneous islands of variables.

Definition 2. A homogeneous island is a set of variables $\mathbf{x}_H$ connected by positive-capacity edges such that each variable $x_i \in \mathbf{x}_H$ has normalized beliefs $(\alpha_i, \beta_i)^\top$, where either $\forall i.\, \alpha_i = 0$ or $\forall i.\, \beta_i = 0$. Further, after any number of rounds of message passing amongst variables within the island, any message $m_{\theta_{ij} \to x_j}(x_j)$ from a variable $x_i$ inside the island to a variable $x_j$ outside the island is identically 0, and vice versa.

Call the variables inside a homogeneous island with nonzero unary potentials seeds of the island. Fig. 3 shows an illustration of homogeneous islands. Homogeneous islands allow us to analyze messages independently within each island, without considering cross-island messages.

Lemma 4. At the end of Phase 1, the messages of 
APMP define a collection of homogeneous islands. 

Proof. This is essentially equivalent to how the max- 
flow min-cut theorem proves that the Ford-Fulkerson 
algorithm has found a minimum cut when no more 
augmenting paths can be found. The boundaries be- 
tween islands are the locations of the cuts. See sup- 
plementary material. □ 

Lemma 4 lets us analyze Strict MP independently within each homogeneous island, because it shows that no nonzero messages will cross island boundaries. Thus, we can prove that, internally, each island will reach an MP fixed point:

Lemma 5 (Internal Convergence). Iterating Strict MP inside a homogeneous island of the form $(\alpha, 0)^\top$ (or $(0, \beta)^\top$) will lead to a fixed point where beliefs are of the form $(\alpha'_i, 0)^\top, \alpha'_i > 0$ (or $(0, \beta'_i)^\top, \beta'_i > 0$) at each variable in the island.

Proof. (Sketch) We prove the case where the unary potentials inside the island have the form $(\alpha_i, 0)^\top$. The case where they have the form $(0, \beta_i)^\top$ is entirely analogous.

At the beginning of Phase 2, all nonzero unary potentials in the island will be of the form $(\alpha, 0)^\top, \alpha > 0$. By the positive-capacity edge connectivity property of homogeneous islands, messages of the form $(\alpha, 0)^\top, \alpha > 0$ will eventually be propagated to all variables in the island by Strict MP. In addition, messages can only reinforce (and not cancel) each other. For example, in a single-loop homogeneous island, messages will cycle around the loop, growing as unary potentials are added to incoming messages and passed around the loop. Messages will only stop growing when the variable-to-factor messages become stronger than the pairwise potential.

On acyclic island structures, Strict MP will obviously converge. On loopy graphs, messages will increase monotonically until they are capped by the pairwise potentials (i.e., the pairwise potentials are saturated). The rate of message increase is lower bounded by some constant (that depends on the strength of the unary potentials and the size of loops in the island graph, which are fixed), so the sequence will converge when all pairwise potentials are saturated. □
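A small simulation illustrates the single-loop argument above. It is a sketch under our own reading of Strict MP as plain, undamped, sequential min-sum updates with each message shifted so that its smaller entry is 0; the ring size, the single seed $(\alpha, 0)^\top$ (cost of $x = 0$, cost of $x = 1$), and the Potts-style pairwise cost $\lambda$ are hypothetical choices. Loop messages grow by $\alpha$ per trip around the ring until they saturate at $\lambda$, after which every belief has the form $(\alpha'_i, 0)^\top$ with $\alpha'_i > 0$.

```python
def strict_mp_ring(M, lam, alpha, max_sweeps=100_000):
    """Sequential min-sum on a ring of M >= 3 binary variables (illustrative sketch).

    Each pairwise factor charges lam when its endpoints disagree; only variable 0
    has a nonzero unary potential, (alpha, 0) = (cost of x=0, cost of x=1).
    Returns (beliefs, number of sweeps, largest message entry).
    """
    if M < 3:
        raise ValueError("need at least 3 variables for a simple loop")
    unary = [(alpha, 0.0)] + [(0.0, 0.0)] * (M - 1)
    # msg[(i, j)]: factor-to-variable message from the factor on edge {i, j} into x_j
    msg = {}
    for i in range(M):
        msg[(i, (i + 1) % M)] = (0.0, 0.0)
        msg[((i + 1) % M, i)] = (0.0, 0.0)

    def update(i, j):
        # variable-to-factor message from x_i: unary plus the message from i's other factor
        k = (i - 1) % M if j == (i + 1) % M else (i + 1) % M
        h0 = unary[i][0] + msg[(k, i)][0]
        h1 = unary[i][1] + msg[(k, i)][1]
        # min over x_i of [lam * (x_i != x_j) + h(x_i)], then shift so the minimum is 0
        out0, out1 = min(h0, h1 + lam), min(h0 + lam, h1)
        low = min(out0, out1)
        return (out0 - low, out1 - low)

    for sweep in range(1, max_sweeps + 1):
        changed = False
        clockwise = [(i, (i + 1) % M) for i in range(M)]
        counter = [(i, (i - 1) % M) for i in reversed(range(M))]
        for i, j in clockwise + counter:
            new = update(i, j)
            if new != msg[(i, j)]:
                msg[(i, j)] = new
                changed = True
        if not changed:
            break  # a full sweep changed nothing: fixed point reached

    beliefs = []
    for i in range(M):
        left, right = msg[((i - 1) % M, i)], msg[((i + 1) % M, i)]
        beliefs.append(tuple(unary[i][x] + left[x] + right[x] for x in (0, 1)))
    largest = max(max(m) for m in msg.values())
    return beliefs, sweep, largest
```

Comparing `strict_mp_ring(6, 5.0, 0.5)` with `strict_mp_ring(6, 5.0, 5.0)` shows that the weaker seed needs more sweeps before the loop messages saturate at $\lambda$, while both runs end with every belief decoding to the seed's label.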

We can now prove our second main result: 

Theorem 2 (Guaranteed Convergence and Optimal- 
ity of APMP Fixed Point). APMP converges to an 
optimal fixed point on binary submodular energy func- 
tions. 

Proof. After running Phase 2 of APMP, Lemma 5 
shows that each homogeneous island will converge to 
a fixed point where beliefs at all variables in the island 
can be decoded to give the same assignment as the 
initial seed of the island. This is the same assignment 
as the optimal graph cuts-style connected components 
decoding would yield. Cross-island messages are all 
zero, and if a variable is not in an island, it has zero 
potential, sends and receives all zero messages, and 
can be assigned arbitrarily. Thus, we are globally at 
a MP fixed point, and beliefs can be decoded at each 
variable to give the optimal assignment. □ 

[Figure 4 panels: (a) Bad Fixed Point; (b) Optimal Fixed Point]

Figure 4: The canonical counterexample used to show that MP is suboptimal on binary submodular energy functions. Potentials are given in square brackets. Messages have no parentheses. Pairwise potentials are symmetric with strength $\lambda$, and $\lambda > 2a > 2b$, making the optimal assignment $(1, 1, 1, 1)$. (a) The previously analyzed fixed point. Beliefs at 1 and 4 are $(a, 2\lambda)^\top$, and at 2 and 3 are $(0, b + 3\lambda - 2a)^\top$, which gives a suboptimal assignment. (b) We introduce a second fixed point. Beliefs at 1 and 4 are $(2\lambda + a, 0)^\top$, and at 2 and 3 are $(3\lambda, b)^\top$, which gives the optimal assignment. Our new scheduling and damping scheme guarantees MP will find an optimal fixed point like this for any binary submodular energy function.

Finally, we return to the canonical example used to show the suboptimality of MP on binary submodular energies. The potentials and messages defining a suboptimal fixed point, which is reached by certain suboptimal scheduling and damping schemes, are illustrated in Fig. 4(a). If, however, we run APMP, Phase 1 ends with the messages shown in Fig. 2(b), and Phase 2 converges to the fixed point shown in Fig. 4(b). Decoding beliefs from the messages in Fig. 4(b) indeed gives the optimal assignment of $(1, 1, 1, 1)$.
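The claims about panel (b) of Fig. 4 can be checked numerically. Because the figure itself is not reproduced here, this sketch assumes a topology and unary potentials inferred from the beliefs listed in the caption: edges 1-2, 1-3, 2-3, 2-4, 3-4; unary potentials $(a, 0)^\top$ at variables 1 and 4 and $(0, b)^\top$ at variables 2 and 3 (costs of $x = 0$ and $x = 1$); and the hypothetical values $\lambda = 5$, $a = 2$, $b = 1$, which satisfy $\lambda > 2a > 2b$. It brute-forces the MAP assignment and confirms that setting every factor-to-variable message to $(\lambda, 0)^\top$ is a min-sum fixed point whose beliefs match the caption.

```python
from itertools import product

LAM, A, B = 5.0, 2.0, 1.0  # hypothetical values satisfying LAM > 2A > 2B
NODES = (1, 2, 3, 4)
# assumed topology and unaries, inferred from the caption's beliefs
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
UNARY = {1: (A, 0.0), 2: (0.0, B), 3: (0.0, B), 4: (A, 0.0)}  # (cost of x=0, cost of x=1)
NEIGHBORS = {v: [u for e in EDGES for u in e if v in e and u != v] for v in NODES}

def energy(x):
    # unary costs plus LAM for every edge whose endpoints disagree
    unary = sum(UNARY[v][x[v]] for v in NODES)
    return unary + sum(LAM for i, j in EDGES if x[i] != x[j])

def brute_force_map():
    best = min(product((0, 1), repeat=4), key=lambda xs: energy(dict(zip(NODES, xs))))
    return best, energy(dict(zip(NODES, best)))

def is_fixed_point(msg):
    """msg[(i, j)]: normalized min-sum message into x_j from the factor on edge {i, j}."""
    for i in NODES:
        for j in NEIGHBORS[i]:
            h = [UNARY[i][x] + sum(msg[(k, i)][x] for k in NEIGHBORS[i] if k != j) for x in (0, 1)]
            out = (min(h[0], h[1] + LAM), min(h[0] + LAM, h[1]))
            low = min(out)
            if (out[0] - low, out[1] - low) != msg[(i, j)]:
                return False
    return True

# panel (b): every factor-to-variable message equals (lambda, 0)
optimal_msgs = {(i, j): (LAM, 0.0) for i in NODES for j in NEIGHBORS[i]}
beliefs = {v: tuple(UNARY[v][x] + sum(optimal_msgs[(k, v)][x] for k in NEIGHBORS[v]) for x in (0, 1))
           for v in NODES}
```

Under these assumptions the brute-force minimum is the all-ones assignment with energy $2b$, and every belief prefers $x = 1$.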

6 Convergence Guarantees 

There are several variants of message passing algo- 
rithms for MAP inference that have been theoretically 
analyzed. There are generally two classes of results: 
(a) guarantees about the optimality or partial opti- 
mality of solutions, assuming that the algorithm has 
converged to a fixed point; and (b) guarantees about 
the monotonicity of the updates with respect to some 
bound and whether the algorithm will converge. 

Notable optimality guarantees exist for TRW algorithms (Kolmogorov & Wainwright, 2005) and MPLP (Globerson & Jaakkola, 2008). Kolmogorov and Wainwright (2005) prove that fixed points of TRW satisfying a weak tree agreement (WTA) condition yield optimal solutions to binary submodular problems. Globerson and Jaakkola (2008) show that if MPLP converges to beliefs with unique optimizing values, then the solution is optimal.

Convergence guarantees for message passing algorithms are generally significantly weaker. MPLP is a coordinate ascent algorithm, so it is guaranteed to converge; however, in general it can get stuck at suboptimal points where no improvement is possible via updating the blocks used by the algorithm. Somewhat similarly, TRW-S is guaranteed not to decrease a lower bound. In the limit where the temperature goes to 0, convexified sum-product is guaranteed to converge to a solution of the standard linear program relaxation, but this is not numerically practical to implement (Weiss et al., 2007). However, even for binary submodular energies, we are unaware of results that guarantee convergence for convexified belief propagation, MPLP, or TRW-S in polynomial time.

Our analysis reveals schedules and message passing updates that guarantee convergence in low-order polynomial time to a state where an optimal assignment can be decoded for binary submodular problems. This follows directly from the analysis of max-flow algorithms. By using shortest augmenting paths, the Edmonds-Karp algorithm converges in $O(|V||E|^2)$ time (Edmonds & Karp, 1972). Analysis of the convergence time of Phase 2 is slightly more involved. Given an island with a large single loop of $M$ variables, with strong pairwise potentials (say strength $\lambda$) and only one small nonzero unary potential, say $(\alpha, 0)^\top$, convergence will take on the order of $\lambda M / \alpha$ time, which could be large. In practice, though, we can reach the same fixed point by modifying nonzero unary potentials to be $(\lambda, 0)^\top$, in which case convergence will take just order $M$ time. Interestingly, this modification causes Strict MP to become equivalent to the connected components algorithm used by graph cuts to decode solutions.
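A back-of-the-envelope version of this count, assuming sequential updates that carry the loop message once around all $M$ variables per trip, a single seed $(\alpha, 0)^\top$ with $0 < \alpha \le \lambda$, and zero unary potentials elsewhere on the loop (simplifying assumptions of ours): each trip adds the seed's $\alpha$ to the circulating message until the pairwise potential caps it.

$$m^{(k)} = \min\left(m^{(k-1)} + \alpha,\ \lambda\right),\quad m^{(0)} = 0 \;\;\Longrightarrow\;\; m^{(k)} = \min(k\alpha,\ \lambda),$$

$$k^\star = \left\lceil \lambda / \alpha \right\rceil \text{ trips} \;\;\Longrightarrow\;\; \text{roughly } M \left\lceil \lambda / \alpha \right\rceil \text{ message updates per direction}.$$

Replacing the seed by $(\lambda, 0)^\top$ gives $k^\star = 1$, i.e. on the order of $M$ updates.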

7 Related Work 

There are close relationships between many MAP inference algorithms. Here we discuss the relationships between some of the more notable and similar algorithms. APMP is closely related to dual block coordinate ascent algorithms discussed in (Sontag & Jaakkola, 2009). Phase 1 of APMP can be seen as block coordinate ascent in the same dual. Interestingly, even though both are optimal ascent steps, APMP reparameterizations are not identical to those of the sequential tree-block coordinate ascent algorithm in (Sontag & Jaakkola, 2009) when applied to the same chain.

Graph cuts is also highly related to the Augmenting DAG algorithm (Werner, 2007). Augmenting DAGs are more general constructs than augmenting paths, so with a proper choice of schedule, the Augmenting DAG algorithm could also implement graph cuts.

Our work follows in the spirit of RBP (Elidan et al., 
2006), in that we are considering dynamic schedules 
for belief propagation. RBP is more general, but our 
analysis is much stronger. 

Finally, our work is also related to the COMPOSE 
framework of Duchi et al. (2007). In COMPOSE, spe- 
cial purpose algorithms are used to compute MP mes- 
sages for certain combinatorial-structured subgraphs, 
including binary submodular ones. We show here that 
special purpose algorithms are not needed: the inter- 
nal graph cut algorithm can be implemented purely in 
terms of max-product. Given a problem that contains 
a graph cut subproblem but also has other high order 
or nonsubmodular potentials, our work shows how to 
interleave solving the graph cuts problem and passing 
messages elsewhere in the graph. 

8 Conclusions 

While the proof of equivalence to graph cuts was mod- 
erately involved, the APMP algorithm is a simple spe- 
cial case of MP. The analysis technique is novel: rather 
than relying on the computation tree model for anal- 
ysis, we directly mapped the operations being per- 
formed by the algorithm to a known combinatorial al- 
gorithm. It would be interesting to consider whether 
there are other cases where the MP execution might 
be mapped directly to a combinatorial algorithm. 

We have proven strong statements about MP fixed 
points on binary submodular energies. The analysis 
has a similar flavor to that of Weiss (2000), in that we 
construct fixed points where optimal assignments can 
be decoded, but where the magnitudes of the beliefs 
do not (generally) correspond to meaningful quanti- 
ties. The strategy of isolating subgraphs might apply 
more broadly. For example, if we could isolate single 
loop structures as we isolate homogeneous islands in
Phase 1, a second phase might then be used to find 
optimal solutions in non-homogeneous, loopy regions. 

An alternate view of Phase 1 is that it is an intelli- 
gent initialization of messages for Strict MP in Phase 
2. In this light, our results show that initialization can 
provably determine whether MP is suboptimal or op- 
timal, at least in the case of binary submodular energy 
functions. 

The connection to graph cuts simplifies the space of MAP algorithms. There are now precise mappings between ideas from graph cuts and ideas from belief propagation (e.g., augmenting path strategies to scheduling). It allows us, for example, to map the capacity scaling method from graph cuts to schedules for message passing.

A broad, interesting direction of future work is to further investigate how insights related to graph cuts can be used to improve inference in the more general settings of multilabel, nonsubmodular, and high-order energy functions. At a high level, APMP separates the concerns of improving the dual objective (Phase 1) from concerns regarding decoding solutions (Phase 2). In loopy MP, this delays overcounting of messages until it is safe to do so. We believe that this and other concepts presented here will generalize. We are currently exploring the non-binary, non-submodular case.
A Supplementary Material

Accompanying "Graph Cuts is a 
Max-Product Algorithm" 

We provide additional details omitted from the main 
paper due to space limitations. 

Lemma 1 (Message-Preserving Factors). When passing standard MP messages with the factors as above, if $\theta_{ij} > a$ and $\theta_{ij} > b + f$, the outgoing factor-to-variable message is equal to the incoming variable-to-factor message, i.e., $m_{\theta_{ij} \to x_j} = m_{x_i \to \theta_{ij}}$ and $m_{\theta_{ij} \to x_i} = m_{x_j \to \theta_{ij}}$.
Proof. This follows simply from plugging in values to 
the message updates. We show the i to j direction. 
where the final evaluation of the min functions used the assumptions that $\theta_{ij} > a$ and $\theta_{ij} > b + f$. □
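A quick numerical check of Lemma 1, under an explicit assumption about the factor: the sketch takes the pairwise factor to charge $\theta_{ij}$ whenever $x_i \neq x_j$ (a symmetric form we assume for illustration) and uses the incoming message $(a, b + f)^\top$, the form that appears in the proof of Lemma 2. The helper names are hypothetical.

```python
import random

def factor_to_var(incoming, theta):
    """Min-sum message through a pairwise factor that charges theta when x_i != x_j
    (assumed symmetric form); incoming = (cost at x_i = 0, cost at x_i = 1)."""
    h0, h1 = incoming
    return (min(h0, h1 + theta), min(h0 + theta, h1))

def check_message_preserving(trials=1000, seed=0):
    # random non-negative a, b, f with theta strictly above both a and b + f
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, f = (rng.uniform(0.0, 10.0) for _ in range(3))
        theta = max(a, b + f) + rng.uniform(0.01, 5.0)
        if factor_to_var((a, b + f), theta) != (a, b + f):  # i -> j direction
            return False
        if factor_to_var((b + f, a), theta) != (b + f, a):  # j -> i direction
            return False
    return True
```

When $\theta_{ij}$ is too small the message is clipped instead; for example, with incoming $(1, 2.5)^\top$ and $\theta_{ij} = 0.5$ the second entry drops to $1.5$.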

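Lemma 2. The change in pairwise belief on the current augmenting path $T(t)$ from the beginning of an iteration $t$ to the end of the iteration is

$$\Delta b_{ij}(x_i, x_j) = \begin{pmatrix} 0 & -f \\ f & 0 \end{pmatrix} + f, \quad ij \in T(t), \qquad (24)$$

where rows are indexed by $x_i$, columns by $x_j$, and the scalar $f$ is added to every entry.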



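Proof. At the start of the iteration, message $m_{x_i \to \theta_{ij}}(x_i) = (a, b)^\top$ for some $a, b$. As mentioned in the proof of Corollary 1, during APMP, $m_{x_j \to \theta_{ij}}(x_j)$ will be incremented by exactly the same values as $m_{x_i \to \theta_{ij}}(x_i)$, except in opposite positions. All messages are initialized to 0, so $m_{x_j \to \theta_{ij}}(x_j) = (b, a)^\top$. The initial belief is then

$$b_{ij}(x_i, x_j) = \theta_{ij}(x_i, x_j) + m_{x_i \to \theta_{ij}}(x_i) + m_{x_j \to \theta_{ij}}(x_j) = \theta_{ij}(x_i, x_j) + \begin{pmatrix} \kappa_1 & 2a \\ 2b & \kappa_1 \end{pmatrix}.$$

After passing messages on $T(t)$, $m_{x_i \to \theta_{ij}}(x_i) = (a, b + f)^\top$ and $m_{x_j \to \theta_{ij}}(x_j) = (b + f, a)^\top$. The new belief is

$$b'_{ij}(x_i, x_j) = \theta_{ij}(x_i, x_j) + \begin{pmatrix} \kappa_2 & 2a \\ 2b + 2f & \kappa_2 \end{pmatrix}.$$

Here $\kappa_1 = a + b$ and $\kappa_2 = a + b + f$. Subtracting the initial belief from the final belief finishes the proof:

$$\Delta b_{ij}(x_i, x_j) = b'_{ij}(x_i, x_j) - b_{ij}(x_i, x_j) = \begin{pmatrix} f & 0 \\ 2f & f \end{pmatrix} = \begin{pmatrix} 0 & -f \\ f & 0 \end{pmatrix} + f.$$

□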
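The arithmetic in this proof is easy to check numerically for an arbitrary pairwise table. In this sketch (helper names hypothetical), a pairwise belief is stored as a $2 \times 2$ list with rows indexed by $x_i$, and the messages before and after the iteration are the ones given in the proof. The difference is $f$ on the diagonal, $0$ at $(x_i, x_j) = (0, 1)$, and $2f$ at $(1, 0)$, independently of $\theta_{ij}$.

```python
def pairwise_belief(theta, m_i, m_j):
    # b_ij(x_i, x_j) = theta(x_i, x_j) + m_i(x_i) + m_j(x_j), rows indexed by x_i
    return [[theta[xi][xj] + m_i[xi] + m_j[xj] for xj in (0, 1)] for xi in (0, 1)]

def belief_change(theta, a, b, f):
    # messages at the start and at the end of one augmenting-path iteration
    before = pairwise_belief(theta, (a, b), (b, a))
    after = pairwise_belief(theta, (a, b + f), (b + f, a))
    return [[after[xi][xj] - before[xi][xj] for xj in (0, 1)] for xi in (0, 1)]
```

With `theta = [[0.0, 3.0], [2.0, 0.0]]`, `a = 1.0`, `b = 2.0`, and `f = 0.5`, the change is `[[0.5, 0.0], [1.0, 0.5]]`.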



Lemma 4. At the end of Phase 1, the messages of APMP define a collection of homogeneous islands.

We prove that messages at the end of Phase 1 define homogeneous islands in two parts:

Lemma 6 (Binary Mask Property). If a pairwise factor $\theta_{ij}$ computes outgoing message $m_{\theta_{ij} \to x_j}(x_j) = (0, 0)^\top$ given incoming message $m_{x_i \to \theta_{ij}}(x_i) = (\alpha, 0)^\top$ for some $\alpha > 0$, then it will compute the same $(0, 0)^\top$ outgoing message given any incoming message of the form $m_{x_i \to \theta_{ij}}(x_i) = (\alpha', 0)^\top, \alpha' > 0$. (The same is true of messages with a zero in the opposite position.)

Proof. This essentially follows from plugging values into the message update equations. Suppose $m_{x_i \to \theta_{ij}}(x_i) = (\alpha, 0)^\top$ and $m_{\theta_{ij} \to x_j}(x_j) = (0, 0)^\top$. Plugging into the message update equation, we see that

$$\begin{aligned}
m_{\theta_{ij} \to x_j}(x_j) &= \min_{x_i} \left[\theta_{ij}(x_i, x_j) + m_{x_i \to \theta_{ij}}(x_i)\right] \\
&= \min_{x_i} \left[\theta_{ij}^{10} \cdot x_i (1 - x_j) + \theta_{ij}^{01} \cdot (1 - x_i) x_j + \alpha \cdot (1 - x_i)\right], \\
m_{\theta_{ij} \to x_j}(0) &= \min(\theta_{ij}^{10}, \alpha), \\
m_{\theta_{ij} \to x_j}(1) &= \min(\alpha + \theta_{ij}^{01}, 0) = 0.
\end{aligned}$$

In order for this to evaluate to $(0, 0)^\top$ when $\alpha > 0$, $\theta_{ij}^{10}$ must be 0. Since $\theta_{ij}^{10} = 0$, no matter what value of $\alpha' > 0$ we are given, it is clear that $\min(\theta_{ij}^{10}, \alpha') = 0$. □

Lemma 7 (Iterated Homogeneity). Homogeneous islands of type $(\alpha, 0)$ (or $(0, \beta)$) are closed under passing Strict MP messages between variables in the island. That is, a variable that starts with belief $(\alpha, 0)^\top, \alpha > 0$ will have belief $(\alpha', 0)^\top, \alpha' > 0$ after any number of rounds of message passing.

Proof. Initially, all beliefs have the form $(\alpha_i, 0)^\top, \alpha_i > 0$, by definition. Given an incoming message of the form $(\alpha, 0)^\top, \alpha > 0$, a submodular pairwise factor will compute outgoing message $(\min(\alpha, \theta_{ij}), 0)^\top$, where $\theta_{ij} > 0$. The minimum of two positive quantities is positive. Variable-to-factor messages will sum messages of this same form, and the sum of two non-negative quantities is non-negative. Thus, all messages passed within the island will be of the form $(\alpha, 0)^\top, \alpha > 0$, and so beliefs will be of the proper form. Lemma 6 shows that edges previously defining the boundary of the island will still define the boundary of the island. The case of incoming message $(0, \beta)^\top$ is analogous. □

Proof of Lemma 4. (Sketch) This is essentially equivalent to the max-flow min-cut theorem, which proves the optimality of the Ford-Fulkerson algorithm when no more augmenting paths can be found. In our formulation, at the end of Phase 1, there are by definition no paths with nonzero capacity, which implies that along any path between a variable $i$ with belief $(\alpha, 0)^\top, \alpha > 0$ and a variable $k$ with belief $(0, \beta)^\top, \beta > 0$, there must be a factor-to-variable message that, given incoming message $(\alpha, 0)^\top, \alpha > 0$, would produce outgoing message $(0, 0)^\top$. (This is similarly true of opposite-direction messages.)

Thus, to define the islands, start at each variable with nonzero belief, say of the form $(\alpha, 0)^\top$, and search outwards by traversing each edge if and only if it would pass a nonzero message given incoming message $(\alpha, 0)^\top$. Merge all variables encountered along the search into a single homogeneous island. □
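The island construction in this proof can be sketched as a breadth-first search. The sketch assumes each pairwise factor has the zero-diagonal form used in the proof of Lemma 6, stored as a hypothetical `(t10, t01)` pair, and relies on the binary mask property: an $(\alpha, 0)^\top$ message crosses from $x_u$ to $x_v$ exactly when the cost of $(x_u, x_v) = (1, 0)$ is positive, and a $(0, \beta)^\top$ message exactly when the cost of $(0, 1)$ is positive. Searches from seeds of the same type are merged with a small union-find. If a search from one seed type reaches a variable claimed by the other type, a nonzero-capacity path still exists and Phase 1 was not finished, so the sketch raises an error.

```python
from collections import deque

def homogeneous_islands(unary, pairwise):
    """Group variables into homogeneous islands after Phase 1 (illustrative sketch).

    unary: list of (cost of x=0, cost of x=1), normalized so the smaller entry is 0.
    pairwise: dict {(i, j): (t10, t01)} for a zero-diagonal factor that costs t10
        when (x_i, x_j) = (1, 0) and t01 when (x_i, x_j) = (0, 1).
    Returns a sorted list of (kind, variables): "alpha" for (a, 0), "beta" for (0, b).
    """
    n = len(unary)
    adj = {v: set() for v in range(n)}
    for i, j in pairwise:
        adj[i].add(j)
        adj[j].add(i)

    def passes(u, v, kind):
        # binary mask property: does a nonzero message of this kind cross u -> v?
        if (u, v) in pairwise:
            t10, t01 = pairwise[(u, v)]
        else:
            t01, t10 = pairwise[(v, u)]  # swap so t10 = cost(x_u=1, x_v=0)
        return (t10 if kind == "alpha" else t01) > 0

    parent = list(range(n))  # union-find over variables

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    kind_of = {}
    for seed, (u0, u1) in enumerate(unary):
        if u0 == 0 and u1 == 0:
            continue  # only variables with nonzero unary potentials are seeds
        kind = "alpha" if u0 > 0 else "beta"
        seen, queue = {seed}, deque([seed])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen and passes(u, v, kind):
                    seen.add(v)
                    queue.append(v)
        for v in seen:
            if kind_of.setdefault(v, kind) != kind:
                raise ValueError("a nonzero-capacity path joins opposite seeds")
            parent[find(v)] = find(seed)  # merge into the seed's island

    islands = {}
    for v, kind in kind_of.items():
        islands.setdefault(find(v), (kind, set()))[1].add(v)
    return sorted(islands.values(), key=lambda island: min(island[1]))
```

On a chain 0-1-2-3 with seeds $(2, 0)^\top$ at variable 0 and $(0, 1)^\top$ at variable 3, setting the middle factor to `(0.0, 3.0)` blocks both searches at that edge and yields the islands `{0, 1}` and `{2, 3}`.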



